{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neural Network (MLP) from scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Key notes:\n",
    "\n",
    "- a network *\"learns\"* by modifying its weights. Training a neural network means **finding the right **weights and biases** so the network can solve the problem**. I.e. **minimizing a loss function**\n",
    "<br><br>\n",
    "- Think of a neuron as a function, that takes the activations of ALL neurons in the previous layer, and outputs a number \n",
    "<br><br>\n",
    "- The activation of a neuron is a measure of how \"positive\" the relevant weighted sum is.\n",
    "<br><br>\n",
    "- The bias is simply a value that lets us choose when a neuron is meaningfully active. Think: \"bias for inactivity\". Ex. only want neurons with a weighted sum > 10 to be activated, set the bias = -10.\n",
    "<br><br>\n",
    "- The weighted sum in a neural network represents the combined influence of all input neurons (i.e., neurons from the previous layer) on a single neuron in the current layer.\n",
    "<br><br>\n",
    "    - It signifies the strength of a connection, determining how much impact each \n",
    "    input has on the neuron's output and its potential to activate based on the \n",
    "    weighted input signals."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Structure\n",
    "\n",
    "- **Forward pass**: the process of passing the input data through the network to compute the output (predictions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Neuron:\n",
    "    def __init__(self):\n",
    "        self.activation = None\n",
    "        self.weight = None\n",
    "        self.bias = None\n",
    "\n",
    "    def set_activation(self, value: float) -> None:\n",
    "        \"\"\" Set the activation value of the neuron \"\"\"\n",
    "        self.activation = value\n",
    "\n",
    "    def set_bias(self, value: float) -> None:\n",
    "        \"\"\" Set the bias of the neuron \"\"\"\n",
    "        self.bias = value\n",
    "\n",
    "    def get_bias(self) -> None:\n",
    "        \"\"\" Return the bias of the neuron \"\"\"\n",
    "        return self.bias\n",
    "\n",
    "\n",
    "class Layer:\n",
    "\n",
    "    # num_inputs : number of neurons in the previous layer\n",
    "    def __init__(self, num_inputs: int, num_neurons: int):\n",
    "        \"\"\" Initialize a layer.\n",
    "            - num_inputs: number of neurons in the PREVIOUS layer.\n",
    "            - num_neurons: number of neurons in the CURRENT layer.\n",
    "        \"\"\"\n",
    "        self.neurons = [Neuron() for _ in range(num_neurons)]       # list of neuron objects\n",
    "        self.weights = np.random.randn(num_inputs, num_neurons)     # matrix of (initially random) values/weights. shape (num_inputs, num_neurons)\n",
    "        self.biases = np.zeros(num_neurons) # bias vector\n",
    "\n",
    "    def set_activations(self, inputs: np.ndarray) -> None:\n",
    "        \"\"\" Calculate the weighted sum for a neuron in the layer \n",
    "            (this is NOT the neuron's activation value). \"\"\"\n",
    "        self.activations = np.dot(inputs, self.weights) + self.biases\n",
    "    \n",
    "    def apply_activation_function(self) -> None:\n",
    "        \"\"\" Apply the activation function (ReLU) to the weighted sum. \n",
    "            This is the value of the neuron's activation. \"\"\"\n",
    "        self.activations = self.relu(self.activations)\n",
    "\n",
    "    def get_activations(self) -> np.ndarray:\n",
    "        \"\"\" Return the activations of the layer. \"\"\"\n",
    "        return self.activations\n",
    "    \n",
    "    @staticmethod\n",
    "    def relu(x: np.ndarray) -> np.ndarray:\n",
    "        \"\"\" \n",
    "        Apply the rely activation function. \n",
    "        If the input value > 0, return the input value.\n",
    "        If the input value == 0, return 0.\n",
    "        \"\"\"\n",
    "        return np.maximum(0, x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "#### Key notes:\n",
    "\n",
    "- **Loss function**: returns how inaccurate the network's outputs are\n",
    "<br><br>\n",
    "- **Gradient descent**: an algorithm to minimize the loss function. goal: find global minima of the loss function\n",
    "    - i.e. find the network's parameters (weights and biases) that minimize the loss function (i.e. minimize the network's output errors)\n",
    "    <br>\n",
    "    - To calculate **gradients**: find the derivative of the loss w.r.t. network output\n",
    "        - $\\frac{\\partial \\text{loss}}{\\partial a} = \\frac{a - y_{\\text{true}}}{m}$ \n",
    "        <br>\n",
    "        where m is the number of examples\n",
    "<br><br>\n",
    "- **Epoches**: **one complete pass** through the entire training dataset\n",
    "<br><br>\n",
    "- **Learning rate:** hyperparameter that controls how big a step you take when updating your network's parameters (weights and biases) during gradient descent\n",
    "    - **too high**: a high learning rate might cause training to overshoot the minimum of the loss function\n",
    "    - **too low**: a low learning rate can make the training process very slow or stuck in local minima\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# loss function: mean squared error (mse)\n",
    "def loss(y_pred: np.ndarray, y_true: np.ndarray)->float:\n",
    "    \"\"\" Determines how correct the network's predictions are. \"\"\"\n",
    "    return 0.5 * (y_pred - y_true)**2   # coefficient of 1/2 to make derivative cleaner\n",
    "\n",
    "# calculates gradients\n",
    "def loss_derivative(y_pred:np.ndarray, y_true:np.ndarray)->float:\n",
    "    \"\"\" Compute derivative of the loss function w.r.t. predictions. I.e. the gradients \"\"\"\n",
    "    return y_pred - y_true\n",
    "\n",
    "# derivative of the activation function\n",
    "def relu_derivative(z: np.ndarray) -> np.ndarray:\n",
    "    \"\"\" Calculate the derivative of relu: returns 1 for positive values, 0 otherwise \"\"\"\n",
    "    if z > 0:\n",
    "        return 1\n",
    "    return 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Functions for gradient descent\n",
    "\n",
    "def compute_gradients(layer: Layer, input_data: np.ndarray, target_outputs: np.ndarray) -> tuple:\n",
    "            \"\"\" Compute gradients. Tells network how to incrementally minimize loss function \"\"\"\n",
    "            \n",
    "            # compute the gradient (loss derivative) w.r.t activations\n",
    "            dA = loss_derivative(layer.get_activations(), target_outputs)   \n",
    "\n",
    "            # compute the gradient (loss derivative) w.r.t weights using matrix multiplication\n",
    "            dW = (1 / input_data.shape[0]) * np.dot(input_data.T, dA) \n",
    "\n",
    "            # compute gradient w.r.t biases\n",
    "            dB = (1/ input_data.shape[0]) * np.sum(dA, axis=0)\n",
    "\n",
    "            return dW, dB\n",
    "\n",
    "def update_parameters(layer: Layer, dW: float, dB: float, learning_rate: float):\n",
    "        # update weights\n",
    "        layer.weights -= learning_rate * dW\n",
    "\n",
    "        # update biases\n",
    "        layer.biases -= learning_rate * dB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Backward process (backpropagation)\n",
    "\n",
    "The algorithm for determining how a single training example would nudge the network's weights and biases, in terms of relative proportions to those changes which would give the most rapid decrease in the cost."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Complete MLP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "# ------- Useful Functions -------\n",
    "\n",
    "# relu (activation function)\n",
    "def relu(x: float) -> float:\n",
    "    \"\"\" Returns x if x > 0, else 0\"\"\"\n",
    "    return np.maximum(0, x)\n",
    "\n",
    "def relu_derivative(x: float) -> float:\n",
    "    \"\"\" Returns 1 if x > 0, else 0\"\"\"\n",
    "    return np.where(x > 0, 1, 0)\n",
    "\n",
    "# loss function (mse)\n",
    "def mse(y_pred, y_true)->float:\n",
    "    \"\"\" Returns how accurate the network's predictions are \n",
    "    compared to expected output \"\"\"\n",
    "    return np.mean(0.5 * (y_pred - y_true)**2)\n",
    "\n",
    "def mse_derivative(y_pred, y_true):\n",
    "    # note: dividing by number of examples for gradient averaging\n",
    "    return (y_pred - y_true) / y_true.shape[0]\n",
    "\n",
    "\n",
    "# ------- Layer structure -------\n",
    "\n",
    "class Layer:\n",
    "    def __init__(self, num_inputs, num_neurons):\n",
    "        # initialize layer with random weights and zero biases\n",
    "        self.weights = np.random.randn(num_inputs, num_neurons)     # 2d array (num_inputs, num_neurons)\n",
    "        self.biases = np.zeros((1, num_neurons))                    # effectively a vector (2d array with shape (1, num_neurons) )\n",
    "\n",
    "    def forward(self, inputs):\n",
    "        \"\"\" Forward process \"\"\"\n",
    "        # compute weighted sum (z) and activations (a)\n",
    "        self.inputs = inputs    # store inputs for use in backprop\n",
    "        self.z = np.dot(inputs, self.weights) + self.biases  # weighted sum\n",
    "        self.activations = relu(self.z)     # apply activation function\n",
    "        return self.activations\n",
    "    \n",
    "    def backward(self, dA, learning_rate):\n",
    "        \"\"\" Backpropagation / backward process \"\"\"\n",
    "        # compute derivative of activation function\n",
    "        dz = dA * relu_derivative(self.z)\n",
    "        # compute gradients for weights and biases\n",
    "        dW = np.dot(self.inputs.T, dz)\n",
    "        dB = np.sum(dz, axis=0, keepdims=True)\n",
    "        # compute gradient to pass to previous layer\n",
    "        dinputs = np.dot(dz, self.weights.T)\n",
    "        # update parameters\n",
    "        self.weights -= learning_rate * dW\n",
    "        self.biases -= learning_rate * dB\n",
    "        return dinputs\n",
    "    \n",
    "\n",
    "# ------- Network structure -------\n",
    "\n",
    "class Network:\n",
    "    def __init__(self, input_dim, hidden_dim, output_dim):\n",
    "        # initialize network layers\n",
    "        self.hidden_layer = Layer(input_dim, hidden_dim)\n",
    "        self.output_layer = Layer(hidden_dim, output_dim)\n",
    "\n",
    "    def forward(self, x):\n",
    "        # forward pass through the network\n",
    "        self.hidden_activations = self.hidden_layer.forward(x)\n",
    "        self.output_activations = self.output_layer.forward(self.hidden_activations)\n",
    "        return self.output_activations\n",
    "    \n",
    "    def backward(self, x, y, learning_rate):\n",
    "        # perform a forward pass\n",
    "        y_pred = self.forward(x)    \n",
    "\n",
    "        # compute loss derivative at the output\n",
    "        dLoss = mse_derivative(y_pred, y)  \n",
    "\n",
    "        # backpropagate through output layer\n",
    "        dHidden = self.output_layer.backward(dLoss, learning_rate) \n",
    "\n",
    "        # backpropagate through hidden layer\n",
    "        self.hidden_layer.backward(dHidden, learning_rate) \n",
    "\n",
    "    def train(self, x, y, epochs, learning_rate):\n",
    "        # training loop\n",
    "        for epoch in range(epochs):\n",
    "            y_pred = self.forward(x)\n",
    "            loss_val = mse(y_pred, y)\n",
    "            self.backward(x, y, learning_rate)\n",
    "            if epoch % 100 == 0:\n",
    "                print(f'epoch {epoch}, loss: {loss_val}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 0, loss: 0.4750603820847576\n",
      "epoch 100, loss: 0.441562207673828\n",
      "epoch 200, loss: 0.43494305406359274\n",
      "epoch 300, loss: 0.43316537643373715\n",
      "epoch 400, loss: 0.4319842778344139\n",
      "epoch 500, loss: 0.4311104397661298\n",
      "epoch 600, loss: 0.4299355672965768\n",
      "epoch 700, loss: 0.42835925122788915\n",
      "epoch 800, loss: 0.42714772313095556\n",
      "epoch 900, loss: 0.42603610675377207\n"
     ]
    }
   ],
   "source": [
    "# example usage:\n",
    "if __name__ == '__main__':\n",
    "    # create some dummy data: x as inputs and y as target outputs\n",
    "    x = np.random.randn(100, 3)    # 100 examples, 3 features each (thus 3 inputs)\n",
    "    #print(x)\n",
    "    y = np.random.randn(100, 1)    # 100 target outputs\n",
    "\n",
    "    # initialize the network (3 inputs, 4 neurons in hidden layer, 1 output)\n",
    "    net = Network(input_dim=3, hidden_dim=4, output_dim=1)\n",
    "    \n",
    "    # train the network\n",
    "    net.train(x, y, epochs=1000, learning_rate=0.01)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the loss is consistently decreasing during training, it's generally a good sign that the network is learning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Recommended Resources\n",
    "\n",
    "-  [3Blue1Brown Playlist](https://www.3blue1brown.com/topics/neural-networks)\n",
    "- [ML cheatsheet](https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html)\n",
    "- [Kaggle Intro to NN](https://www.kaggle.com/code/ryanholbrook/deep-neural-networks)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ML-from-scratch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
